A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts
Abstract
Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 200 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.
Similar resources
Europe PMC: Quick tour
What is Europe PMC? Europe PMC [2] is a global, free, biomedical literature repository, providing access to worldwide life sciences articles, books, patents and clinical guidelines. The resource currently contains over 32 million abstracts and more than 4 million full-text articles (see Figure 1). A subset of the full-text information corpus is the open-access literature that can be downloaded ...
Accessing Full Text of Articles: A Study on the Status of Medical Universities in Tehran
Introduction. Due to the rapid development of information technology and world wide web, there is easy and fast access to medical information and medical journals. Although there is free and easy access to articles' abstracts through Medline on the internet, accessing full text articles still remains a problem. This study was carried out to investigate the best way we could access full text of ...
A Review of Methods for Assessing the Care Needs of Patients with Disabilities
Introduction: The care needs of patients with disabilities are often neglected or not fully identified. Knowing the different methods of assessment can help the care team to choose the best and most comprehensive method of assessing care needs. Methods: This scoping review study was designed and implemented with a 5-step Arksey & O'Malley approach. The search strategy was set using the keywords...
Biomedical Literature Mining for Pharmacokinetics Numerical Parameter Collection
Model-based drug studies have been developing very fast recently. They require high-quality pharmacokinetics (PK) parameter numerical data. However, most parameter measurements are still buried in the scientific literature. Traditional manual data extraction is too expensive to handle the exponentially growing numb...
Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization
Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked...